Wikipedia Arborification and Stratified Explicit Semantic Analysis

Authors

  • Yannis Haralambous
  • Vitaly Klyuev
Abstract

We present an extension of the explicit semantic analysis method of Gabrilovich and Markovitch. Using their semantic relatedness measure, we weight the Wikipedia category graph. We then extract a minimum spanning tree from it by means of the Chu-Liu & Edmonds algorithm. We define a notion of stratified tfidf, the strata being, for a given Wikipedia page and a given term, the classic tfidf and the categorical tfidfs in the ancestor categories, in the sense of the minimum spanning tree. Our method uses this stratified tfidf, which favors terms that "survive" when one moves from pages to categories, heading toward the root of the tree. We evaluate it on a classification of texts drawn from the WikiNews corpus and find that it yields a precision gain of 18%. We conclude with a series of directions for future research.
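
The strata described in the abstract can be illustrated with a minimal sketch. It only collects the strata for one page and one term: the classic tfidf first, then the categorical tfidf of each ancestor category along the spanning tree. Everything named here is an assumption made for illustration and does not come from the paper: the `parent` map encoding the tree, the `page_tfidf` and `category_tfidf` lookup tables, the toy values, and the simplification that a page is attached to a single category. How the paper combines the strata into one score is not stated in the abstract, so the sketch stops at the list of strata.

```python
from typing import Dict, List, Optional, Tuple


def ancestor_chain(category: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the category followed by its ancestors, up to the tree root."""
    chain: List[str] = []
    seen = set()
    node: Optional[str] = category
    # Stop at the root (parent is None) or if a cycle is met (defensive guard).
    while node is not None and node not in seen:
        chain.append(node)
        seen.add(node)
        node = parent.get(node)
    return chain


def tfidf_strata(
    term: str,
    page: str,
    page_category: str,
    page_tfidf: Dict[Tuple[str, str], float],
    category_tfidf: Dict[Tuple[str, str], float],
    parent: Dict[str, Optional[str]],
) -> List[float]:
    """Collect the strata for (page, term): page tfidf, then ancestor tfidfs."""
    # Stratum 0: the classic tfidf of the term in the page (0.0 if absent).
    strata = [page_tfidf.get((page, term), 0.0)]
    # Next strata: categorical tfidf in each ancestor, moving toward the root.
    for category in ancestor_chain(page_category, parent):
        strata.append(category_tfidf.get((category, term), 0.0))
    return strata


# Hypothetical toy tree: Jazz -> Music -> Arts (root).
parent = {"Jazz": "Music", "Music": "Arts", "Arts": None}
page_tfidf = {("Miles Davis", "trumpet"): 0.8}
category_tfidf = {("Jazz", "trumpet"): 0.5, ("Music", "trumpet"): 0.2}

strata = tfidf_strata(
    "trumpet", "Miles Davis", "Jazz", page_tfidf, category_tfidf, parent
)
print(strata)
```

In this toy example the term keeps a nonzero value in the first two ancestor categories and vanishes at the root, which is the kind of "survival" toward the root that the abstract says the method rewards.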

Similar resources

UPC-CORE: What Can Machine Translation Evaluation Metrics and Wikipedia Do for Estimating Semantic Textual Similarity?

In this paper we discuss our participation to the 2013 Semeval Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate automatic against reference translations and (ii) an instance of explicit semantic analysis, built upon opening paragraphs of Wikipedia 2010 articles. Our similarity estimator ...

Non-Orthogonal Explicit Semantic Analysis

Explicit Semantic Analysis (ESA) utilizes the Wikipedia knowledge base to represent the semantics of a word by a vector where every dimension refers to an explicitly defined concept like a Wikipedia article. ESA inherently assumes that Wikipedia concepts are orthogonal to each other, therefore, it considers that two words are related only if they co-occur in the same articles. However, two word...

Query Expansion Using Wikipedia and Dbpedia

In this paper, we describe our query expansion approach submitted for the Semantic Enrichment task in Cultural Heritage in CLEF (CHiC) 2012. Our approach makes use of an external knowledge base such as Wikipedia and DBpedia. It consists of two major steps, concept candidates generation from knowledge bases and the selection of K-best related concepts. For selecting the K-best concepts, we ranke...

Wikipedia Link Structure and Text Mining for Semantic Relation Extraction

Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a. k. a. Web 2.0). In fact, i...

Combining Heterogeneous Knowledge Resources for Improved Distributional Semantic Models

The Explicit Semantic Analysis (ESA) model based on term cooccurrences in Wikipedia has been regarded as state-of-the-art semantic relatedness measure in the recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose the use of ESA with multiple lexical semantic resources thus exploiting multiple evidence of term ...

Publication date: 2012